DETAILED ACTION

Response to Arguments 

1. Applicant's arguments, see Remarks page 15, last paragraph, filed 11/23/2009,
with respect to the rejection(s) of claim(s) 1-27 under 35 USC 103(a) have been fully 
considered and are persuasive. Therefore, the rejection has been withdrawn. 
However, upon further consideration, a new ground(s) of rejection is made in view of 
Kanevsky et al. US 6529902 (hereinafter Kanevsky). 

NOTE: Examiner would like to remind Applicant of the following: 

"USPTO personnel are to give claims their broadest reasonable interpretation in 
light of the supporting disclosure. In re Morris, 127 F.3d 1048, 1054-55, 
44 USPQ2d 1023, 1027-28 (Fed. Cir. 1997). Limitations appearing in the
specification but not recited in the claim should not be read into the claim. E-Pass 
Techs., Inc. v. 3Com Corp., 343 F.3d 1364, 1369, 67 USPQ2d 1947, 1950 (Fed.
Cir. 2003) (claims must be interpreted "in view of the specification" without 
importing limitations from the specification into the claims unnecessarily). In re 
Prater, 415 F.2d 1393, 1404-05, 162 USPQ 541, 550-551 (CCPA 1969). See
also In re Zletz, 893 F.2d 319, 321-22, 13 USPQ2d 1320, 1322 (Fed. Cir. 1989) 
("During patent examination the pending claims must be interpreted as broadly 
as their terms reasonably allow.... The reason is simply that during patent 
prosecution when claims can be amended, ambiguities should be recognized,
scope and breadth of language explored, and clarification imposed.... An 
essential purpose of patent examination is to fashion claims that are precise, 
clear, correct, and unambiguous. Only in this way can uncertainties of claim 
scope be removed, as much as possible, during the administrative process."). 
Where an explicit definition is provided by the applicant for a term, that definition 
will control interpretation of the term as it is used in the claim. Toro Co. v. White
Consolidated Industries Inc., 199 F.3d 1295, 1301, 53 USPQ2d 1065, 1069 (Fed. 
Cir. 1999) (meaning of words used in a claim is not construed in a "lexicographic 
vacuum, but in the context of the specification and drawings."). Any special 
meaning assigned to a term "must be sufficiently clear in the specification that 
any departure from common usage would be so understood by a person of 
experience in the field of the invention." Multiform Desiccants Inc. v. Medzam 
Ltd., 133 F.3d 1473, 1477, 45 USPQ2d 1429, 1432 (Fed. Cir. 1998). See also
MPEP §2111.01." 

While giving claims their broadest reasonable interpretation in light of the supporting 
disclosure without importing limitations from the specification into the claims 
unnecessarily, Examiner believes Kanevsky to teach "the difference in model 
information between the phoneme models of the pair of corresponding phoneme 
models is insignificant", wherein a Kullback-Leibler distance is a well-known method for
establishing sufficient separation between various data groups. Examiner considers the
Kullback-Leibler distance approach in light of the specification of the present invention,
such as page 7, which describes insignificant differences based on Kullback-Leibler
distance. See the rejection below, with Kanevsky now incorporated.

Neti already appears to establish modeled male and female gender differences using a 
confidence measure approach. Neti teaches a gender independent identification 
system containing gender independent probabilistic state codebooks, wherein the best 
distance is found reflecting the proper gender of an utterance when compared to a 
gender class (Neti Col. 6 lines 50-67). 

Kanevsky also explicitly teaches how a difference is sufficient, such as classifying data
groups when compared, and also creating independence from classification if there is
no topic discovered (Kanevsky Col. 5 lines 8-25).



Claim Rejections - 35 USC § 103 

2. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made.

3. Claims 1-16 are rejected under 35 U.S.C. 103(a) as being unpatentable over Neti
et al. US 5953701 A (hereinafter Neti) in view of Yang US 20010010039 A1 (hereinafter 
Yang) and further in view of Kanevsky et al. US 6529902 (hereinafter Kanevsky). 

Re claims 1, 6, 11, and 16, Neti teaches a method for generating speech 
recognition models, the method comprising: 

receiving a female speech recognition model of phoneme models based on the 
female set of recorded phonemes training data (Col. 5 lines 9-21, Fig. 4); 

receiving a male speech recognition model of phoneme models based on the 
male set of recorded phonemes training data (Col. 5 lines 9-21, Fig. 4); 

determining a difference in model information between pairs of corresponding 
phoneme models of the female speech recognition model and the male speech 
recognition model (Col. 5 lines 9-21); 

creating a gender-independent speech recognition model that includes a gender- 
independent phoneme model based on if a pair of corresponding phoneme models of 
the female speech recognition model and the male speech recognition model (Col. 5 
lines 9-21) when the difference in model information between the phoneme models of 
the pair of corresponding phoneme models is insignificant 

However, Neti fails to teach phoneme training data and phoneme models.

Yang teaches a Mandarin Chinese speech recognition apparatus comprising a
speech signal filter for receiving a speech signal and creating a filtered analogue signal, 
an analogue-to-digital (A/D) converter connected to the speech signal to a digital 
speech signal, a computer connected to the A/D converter for receiving and processing 
the digital signal, a pitch frequency detector connected to the computer for detecting
characteristics of the pitch frequency of the speech signal thereby recognizing tone in 
the speech signal, a speech signal pre-processor connected to the computer for 
detecting the endpoints of syllables of speech signals thereby defining a beginning and 
ending of a syllable, and a training portion connected to the computer for training an 
initial part PSV model and a final part PSV model and for training a syllable model 
based on trained parameters of the initial part PSV model and the final part PSV model 
(Yang [0016]). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Neti to incorporate phoneme training data 
and phoneme models as taught by Yang to allow for defining a beginning and ending of 
a syllable, wherein characteristics such as pitch and tone are used to find differences 
between phonemes (Yang [0016]) in both male and female voices. 

However, Neti in view of Yang fails to teach the difference in model information 
between the phoneme models of the pair of corresponding phoneme models is 
insignificant. 

Kanevsky teaches that the Kullback-Leibler distance between any two topics is at
least h, where h is some sufficiently large threshold (Kanevsky Col. 5, lines 9-11).
Kanevsky also teaches that, using the Kullback-Leibler distance, one can check which
pairs of topics are sufficiently separated from each other, and that topics that are
close in this metric could be combined together (Kanevsky Col. 12, lines 44-47).

Kanevsky also explicitly teaches how a difference is sufficient, such as
classifying data groups when compared, and also creating independence from
classification if there is no topic discovered (Kanevsky Col. 5 lines 8-25).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Neti in view of Yang to incorporate the 
difference in model information between the phoneme models of the pair of 
corresponding phoneme models is insignificant as taught by Kanevsky to allow for an 
improved language modeling for automatic speech decoding and differentiation 
between data groups, wherein a sufficiently large threshold indicates either separate or 
combinational probabilities (Kanevsky Col. 2, lines 50-52). 
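
For illustration only, the threshold test discussed above can be sketched as follows. This is a minimal sketch, not the method of Kanevsky or of the present application: it assumes each model is summarized as a discrete probability distribution, uses a symmetrized Kullback-Leibler distance, and treats a pair as mergeable when that distance falls below a threshold h. The function names and the smoothing constant `eps` are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps is an assumed smoothing constant that avoids log(0).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Symmetrized distance: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def should_merge(p, q, h):
    """Treat the difference as insignificant when the distance is below h."""
    return symmetric_kl(p, q) < h

female = [0.9, 0.1]   # hypothetical distribution for a female phoneme model
male = [0.1, 0.9]     # hypothetical distribution for a male phoneme model
print(should_merge(female, male, h=0.1))    # well separated: keep both models
print(should_merge(female, female, h=0.1))  # identical: combine into one model
```

A pair that passes `should_merge` would be replaced by a single combined model, and a pair that fails would be kept as two separate models.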

Re claims 2, 7, and 12, Neti teaches the method at least one computer readable 
medium of claim 1, further comprising removing each of the phoneme models of the pair
of corresponding phoneme models from the female speech recognition model and the
male speech recognition model (Col. 5 lines 9-21, Fig. 4 & 5, processor 44 outputs
recognized speech based on female dependent models, male dependent models, and
male and female independent models 46 and 48) when the difference in model
information between the phoneme models is insignificant (Col. 1 lines 33-47). 

Note: Examiner finds support for the act of "removing" such as "the processor 
108 removes the separate female models 110 and male models 112 that are 
determined to have insignificant differences from one another. The final result from the 
processor 108 contains female models 110 derived from female training data 104, male
models 112 derived from male training data 106, and gender independent models 114
derived from both the female and male training data 104 and 106, wherein the female
models 110 and male models 112 are significantly different from each other" (present 
invention spec, page 3 line 30 - page 4 line 6). 

However, Neti fails to teach phoneme training data and phoneme models.

Yang teaches a Mandarin Chinese speech recognition apparatus comprising a
speech signal filter for receiving a speech signal and creating a filtered analogue signal, 
an analogue-to-digital (A/D) converter connected to the speech signal to a digital 
speech signal, a computer connected to the A/D converter for receiving and processing 
the digital signal, a pitch frequency detector connected to the computer for detecting 
characteristics of the pitch frequency of the speech signal thereby recognizing tone in 
the speech signal, a speech signal pre-processor connected to the computer for 
detecting the endpoints of syllables of speech signals thereby defining a beginning and 
ending of a syllable, and a training portion connected to the computer for training an 
initial part PSV model and a final part PSV model and for training a syllable model 
based on trained parameters of the initial part PSV model and the final part PSV model 
(Yang [0016]). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Neti to incorporate phoneme training data 
and phoneme models as taught by Yang to allow for defining a beginning and ending of 
a syllable, wherein characteristics such as pitch and tone are used to find differences
between phonemes (Yang [0016]) in both male and female voices. 

However, Neti in view of Yang fails to teach the difference in model information 
between the phoneme models of the pair of corresponding phoneme models is 
insignificant. 

Kanevsky teaches that the Kullback-Leibler distance between any two topics is at
least h, where h is some sufficiently large threshold (Kanevsky Col. 5, lines 9-11).
Kanevsky also teaches that, using the Kullback-Leibler distance, one can check which
pairs of topics are sufficiently separated from each other, and that topics that are
close in this metric could be combined together (Kanevsky Col. 12, lines 44-47).

Kanevsky also explicitly teaches how a difference is sufficient, such as
classifying data groups when compared, and also creating independence from
classification if there is no topic discovered (Kanevsky Col. 5 lines 8-25).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Neti in view of Yang to incorporate the 
difference in model information between the phoneme models of the pair of 
corresponding phoneme models is insignificant as taught by Kanevsky to allow for an 
improved language modeling for automatic speech decoding and differentiation 
between data groups, wherein a sufficiently large threshold indicates either separate or 
combinational probabilities (Kanevsky Col. 2, lines 50-52).

Re claims 3, 8, and 13, Neti in view of Yang fails to teach the method of claim 1,
wherein determining the difference in model information includes calculating a Kullback 
Leibler distance between the first speech recognition model and second speech 
recognition model. 

Kanevsky et al. teaches that for two different sets, one can define a Kullback- 
Leibler distance using the frequencies of the sets. [With the distance] one can check 
which pairs of topics are sufficiently separated from each other. Topics that are close in 
this metric could be combined together (Kanevsky Col. 12, lines 42-47). 

Kanevsky also explicitly teaches how a difference is sufficient, such as 
classifying data groups when compared, and also creating independence from 
classification if there is no topic discovered (Kanevsky Col. 5 lines 8-25).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Neti in view of Yang to incorporate the 
determining the difference in model information includes calculating a Kullback Leibler 
distance between the first speech recognition model and second speech recognition 
model as taught by Kanevsky to allow for an improved language modeling for automatic 
speech decoding and differentiation between data groups, wherein a sufficiently large 
threshold indicates either separate or combinational probabilities (Kanevsky Col. 2, lines 
50-52).

Re claims 4, 9, and 14, Neti in view of Yang fails to teach the method of claim 3,
wherein whether the model information is insignificant is based on a threshold Kullback 
Leibler distance quantity. 

Kanevsky teaches that the Kullback-Leibler distance between any two topics is at
least h, where h is some sufficiently large threshold (Kanevsky Col. 5, lines 9-11).
Kanevsky also teaches that, using the Kullback-Leibler distance, one can check which
pairs of topics are sufficiently separated from each other, and that topics that are
close in this metric could be combined together (Kanevsky Col. 12, lines 44-47).

Kanevsky also explicitly teaches how a difference is sufficient, such as
classifying data groups when compared, and also creating independence from
classification if there is no topic discovered (Kanevsky Col. 5 lines 8-25).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Neti in view of Yang to incorporate whether 
the model information is insignificant is based on a threshold Kullback Leibler distance 
quantity as taught by Kanevsky to allow for an improved language modeling for 
automatic speech decoding and differentiation between data groups, wherein a 
sufficiently large threshold indicates either separate or combinational probabilities 
(Kanevsky Col. 2, lines 50-52). 

Re claims 5, 10, and 15, Neti teaches the method of claim 1, wherein the female
speech recognition model, male speech recognition model, and gender-independent
speech recognition model are Gaussian mixture models (Neti Col. 3 lines 50-67).

4. Claims 17-27 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Neti et al. US 5953701 A (hereinafter Neti) in view of Wark US 20030231775 
(hereinafter Wark) and further in view of Yang US 20010010039 A1 (hereinafter 
Yang). 

Re claims 17, 21, and 24, Neti teaches a system for recognizing
speech data from an audio stream originating from one of a plurality of data classes
([0094]), the system comprising:

a computer processor (Col. 6 lines 24-49); 

a receiving module configured to receive a current feature vector of the audio 
stream (Col. 6 lines 24-49); 

a first computing module configured to compute a current vector probability (Col. 
3 lines 50-67) that the current feature vector belongs to one of the plurality of data 
classes (Col. 5 lines 9-21); 

wherein the plurality of data classes include a first speech recognition model 
based on recorded phonemes originating from a first set of speakers, a second speech 
recognition model based on recorded phonemes from a second set of speakers, and a 
third speech recognition model that includes phoneme models based on pairs of 
corresponding recorded phonemes originating from both the first and second set of 
speakers having insignificant differences in model information between the recorded 
phonemes of the pair of corresponding recorded phonemes (Col. 5 lines 9-21, Fig. 4 & 
5), each of the first speech recognition model and the second speech recognition model
lacking the phoneme models of the third speech recognition model based on pairs of 
corresponding recorded phonemes originating from both the first and second set of 
speakers having insignificant differences in model information between the recorded 
phonemes of the pairs of corresponding recorded phonemes (Col. 1 lines 33-47 & Fig. 
4). 

However, Neti fails to teach a second computing module configured to compute 
an accumulated confidence level that the audio stream belongs to one of the plurality of 
data classes based on the current vector probability and on previous vector 
probabilities; 

a weighing module configured to weigh class models based on the accumulated 
confidence; and 

a recognizing module configured to recognize the current feature vector based
on the weighted class models.

Wark teaches that, to classify homogeneous segments, a number of statistical
features are extracted from each segment. Whilst previous systems extract from each
segment a feature vector, and then classify the segments based on the distribution of
the feature vectors, method 200 divides each homogeneous segment into a number of
smaller sub-segments, or clips hereinafter, with each clip large enough to extract a 
meaningful feature vector f from the clip. The clip feature vectors f are then used to 
classify the segment from which it is extracted based on the characteristics of the 
distribution of the clip feature vectors f. The key advantage of extracting a number of 
feature vectors f from a series of smaller clips rather than a single feature vector for a
whole segment is that the characteristics of the distribution of the feature vectors f over 
the segment of interest may be examined. Thus, whilst the signal characteristics over 
the length of the segment are expected to be reasonably consistent, by virtue of the 
segmentation algorithm, some important characteristics in the distribution of the feature 
vectors f over the segment of interest are significant for classification purposes (Wark
[0094]).

Further, Wark teaches that, to decide whether the segment should be
assigned the label of the class with the highest score, or labeled as "unknown", a
confidence score is calculated. This is achieved by taking the difference of the top two 
model scores .sub.p and .sub.q, and normalizing that difference by the distance 
measure D.sub.pq between their class models p and q. This is based on the premise 
that an easily identifiable segment should be a lot closer to the model it belongs to than 
the next closest model. With further apart models, the model scores .sub.c should also 
be well separated before the segment is assigned the class label of the class with the 
highest score (Wark [0146] & Fig. 4, adjacent, previous and current segment/frame). 
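
As a rough illustration of the confidence score described above, the following sketch takes the difference between the top two model scores and normalizes it by the distance between the two class models. It is written under stated assumptions rather than taken from Wark: the function name, the `distance` callable that returns D.sub.pq, and the explicit `threshold` used to choose between the best class and "unknown" are hypothetical.

```python
def classify_with_confidence(scores, distance, threshold):
    """Label a segment with its best class, or "unknown" if confidence is low.

    scores:    dict mapping class name -> model score (higher is better)
    distance:  assumed callable (p, q) -> positive distance between class models
    threshold: assumed minimum confidence needed to accept the best class
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    p, q = ranked[0], ranked[1]  # best and second-best classes
    # Difference of the top two scores, normalized by the model distance.
    confidence = (scores[p] - scores[q]) / distance(p, q)
    label = p if confidence >= threshold else "unknown"
    return label, confidence

scores = {"speech": -10.0, "music": -14.0, "noise": -30.0}
print(classify_with_confidence(scores, lambda p, q: 2.0, threshold=1.0))
```

Dividing by the model distance means that widely separated models require a larger score gap before a label is assigned, which matches the premise stated above.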

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Neti to incorporate a second computing 
module configured to compute an accumulated confidence level that the audio stream 
belongs to one of the plurality of data classes based on the current vector probability 
and on previous vector probabilities, a weighing module configured to weigh class 
models based on the accumulated confidence and a recognizing module configured to 
recognize the current feature vector based on the weighted class models as taught by
Wark to allow for normalization of a difference by a distance, whereby an easily 
identifiable segment should be a lot closer to the model it belongs to than the next 
closest model (Wark [0146]), wherein a confidence score or score is used to better 
classify speech, whereby segments of feature vectors are classified, making important 
characteristics in adjacent, current, and previous frames in the distribution of the feature 
vectors more apparent (Wark [0094]). 



However, Neti in view of Wark fails to teach phoneme training data and phoneme
models.

Yang teaches a Mandarin Chinese speech recognition apparatus comprising a
speech signal filter for receiving a speech signal and creating a filtered analogue signal, 
an analogue-to-digital (A/D) converter connected to the speech signal to a digital 
speech signal, a computer connected to the A/D converter for receiving and processing 
the digital signal, a pitch frequency detector connected to the computer for detecting 
characteristics of the pitch frequency of the speech signal thereby recognizing tone in 
the speech signal, a speech signal pre-processor connected to the computer for 
detecting the endpoints of syllables of speech signals thereby defining a beginning and 
ending of a syllable, and a training portion connected to the computer for training an 
initial part PSV model and a final part PSV model and for training a syllable model 
based on trained parameters of the initial part PSV model and the final part PSV model
(Yang [0016]). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Neti in view of Wark to incorporate 
phoneme training data and phoneme models as taught by Yang to allow for defining a 
beginning and ending of a syllable, wherein characteristics such as pitch and tone are 
used to find differences between phonemes (Yang [0016]) in both male and female 
voices. 

Re claims 18, 22, and 25, Neti teaches the method of claim 17, wherein 
computing the current vector probability includes estimating a posteriori class probability 
for the current feature vector (Col. 2 lines 1-8).

Re claims 19, 23, and 26, Neti fails to teach the method of claim 17, wherein
computing the accumulated confidence level further comprising weighing the current 
vector probability more than the previous vector probabilities. 

Wark teaches that, to classify homogeneous segments, a number of statistical
features are extracted from each segment. Whilst previous systems extract from each
segment a feature vector, and then classify the segments based on the distribution of
the feature vectors, method 200 divides each homogeneous segment into a number of
smaller sub-segments, or clips hereinafter, with each clip large enough to extract a
meaningful feature vector f from the clip. The clip feature vectors f are then used to
classify the segment from which it is extracted based on the characteristics of the 
distribution of the clip feature vectors f. The key advantage of extracting a number of 
feature vectors f from a series of smaller clips rather than a single feature vector for a 
whole segment is that the characteristics of the distribution of the feature vectors f over 
the segment of interest may be examined. Thus, whilst the signal characteristics over 
the length of the segment are expected to be reasonably consistent, by virtue of the 
segmentation algorithm, some important characteristics in the distribution of the feature 
vectors f over the segment of interest are significant for classification purposes (Wark
[0094]).

Further, Wark teaches that, to decide whether the segment should be
assigned the label of the class with the highest score, or labeled as "unknown", a
confidence score is calculated. This is achieved by taking the difference of the top two 
model scores .sub.p and .sub.q, and normalizing that difference by the distance 
measure D.sub.pq between their class models p and q. This is based on the premise 
that an easily identifiable segment should be a lot closer to the model it belongs to than 
the next closest model. With further apart models, the model scores .sub.c should also 
be well separated before the segment is assigned the class label of the class with the 
highest score (Wark [0146] & Fig. 4, adjacent, previous and current segment/frame). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Neti to incorporate computing the 
accumulated confidence level further comprising weighing the current vector probability 
more than the previous vector probabilities as taught by Wark to allow for normalization
of a difference by a distance, whereby an easily identifiable segment should be a lot 
closer to the model it belongs to than the next closest model (Wark [0146]), wherein a 
confidence score or score is used to better classify speech, whereby segments of 
feature vectors are classified, making important characteristics in adjacent, current, and 
previous frames in the distribution of the feature vectors more apparent (Wark [0094]). 
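
One way to picture the claimed weighting of the current vector probability over previous ones is an exponentially weighted running average. This is a sketch under assumptions only: neither the cited references nor the claims are stated here to use this particular scheme, and the function name and the weight `alpha` are hypothetical.

```python
def accumulate_confidence(posteriors, alpha=0.7):
    """Accumulate per-class confidence over frames, oldest frame first.

    posteriors: list of dicts mapping class name -> probability for one frame
    alpha:      assumed weight on the current frame; with alpha > 0.5 the
                current frame counts for more than all previous frames combined
    """
    acc = None
    for frame in posteriors:
        if acc is None:
            acc = dict(frame)  # the first frame initializes the accumulator
        else:
            acc = {c: alpha * frame[c] + (1 - alpha) * acc[c] for c in frame}
    return acc

frames = [{"female": 0.2, "male": 0.8}, {"female": 0.9, "male": 0.1}]
print(accumulate_confidence(frames))
```

The accumulated values could then serve as per-class weights when combining class models, which is the role the claims assign to the weighing module.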

Re claims 20 and 27, Neti teaches the method of claim 17, further comprising 
determining if another feature vector is available for analysis (Col. 6 lines 24-49). 



Conclusion 



Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Michael C. Colucci whose telephone number is
(571)-270-1847. The examiner can normally be reached on 9:30 am - 6:00 pm,
Monday-Friday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil can be reached on (571)-272-7602. The fax phone 
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 